AITopics | statement 1

statement 1

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Convergence and Stability Analysis of Self-Consuming Generative Models with Heterogeneous Human Curation

Zhao, Hongru, Fu, Jinwen, Pham, Tuan

arXiv.org Machine Learning, Nov-14-2025

Contemporary pipelines largely learn from preferences, often alongside scalable-oversight efforts ("superalignment"; Burns et al. (2023); Kim et al. (2024); Köpf et al. (2023)), and a growing survey literature maps the practical trade-offs, from data collection and reward inference to evaluation and safety (e.g., Shen et al., 2023; Kaufmann et al., 2025). A common structure underlies many systems: models propose alternatives, people (or proxies) compare them, and those preferences guide the next training round (Shin et al., 2023; Lee et al., 2021; Munos et al., 2024). Within this landscape, two families dominate. Reinforcement Learning from Human Feedback (RLHF) first trains a reward model from comparisons, then improves the policy via reinforcement learning with KL regularization (typically Proximal Policy Optimization, PPO). This accommodates rich, sequence-level signals, but it introduces extra moving parts (reward modeling, on-policy sampling, and tuning) that can make training complex and sometimes unstable at scale (Kirk et al., 2023).
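The two RLHF stages described above can be sketched numerically. The excerpt does not say which comparison model the reward model uses; this sketch assumes the common Bradley-Terry form, P(chosen beats rejected) = sigmoid(r_chosen - r_rejected), and shows the per-sample KL-penalized reward that the policy step then maximizes. The function names and the coefficient `beta` are illustrative assumptions, not taken from the paper.

```python
import math


def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected), assuming a Bradley-Terry comparison model."""
    margin = r_chosen - r_rejected
    # Numerically stable -log(sigmoid(m)): avoid exp() overflow for large |m|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(pairs: list[tuple[float, float]]) -> float:
    """Average loss over (reward_chosen, reward_rejected) comparison pairs."""
    return sum(bradley_terry_loss(c, r) for c, r in pairs) / len(pairs)


def kl_penalized_reward(reward: float, logp_policy: float, logp_ref: float,
                        beta: float = 0.1) -> float:
    """Reward minus beta times the log-ratio between policy and reference model,
    a per-sample estimate of the KL regularizer used during the RL step."""
    return reward - beta * (logp_policy - logp_ref)
```

The loss is smallest when the reward model scores the preferred response well above the rejected one, and the KL term penalizes the policy for drifting far from the reference model.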

machine learning, natural language, regime, (15 more...)

arXiv.org Machine Learning

2511.09002

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

SMAGDi: Socratic Multi Agent Interaction Graph Distillation for Efficient High Accuracy Reasoning

Aluru, Aayush, Malik, Myra, Patankar, Samarth, Kim, Spencer, Zhu, Kevin, O'Brien, Sean, Sharma, Vasu

arXiv.org Artificial Intelligence, Nov-11-2025

Multi-agent systems (MAS) often achieve higher reasoning accuracy than single models, but their reliance on repeated debates across agents makes them computationally expensive. We introduce SMAGDi, a distillation framework that transfers the debate dynamics of a five-agent Llama-based MAS into a compact Socratic decomposer-solver student. SMAGDi represents debate traces as directed interaction graphs, where nodes encode intermediate reasoning steps with correctness labels and edges capture continuity and cross-agent influence. The student is trained with a composite objective combining language modeling, graph-based supervision, contrastive reasoning, and embedding alignment to preserve both fluency and structured reasoning. On StrategyQA and MMLU, SMAGDi compresses a 40B multi-agent system into a 6B student while retaining 88% of its accuracy, substantially outperforming prior distillation methods such as MAGDi, standard KD, and fine-tuned baselines. These results highlight that explicitly modeling interaction graphs and Socratic decomposition enable small models to inherit the accuracy benefits of multi-agent debate while remaining efficient enough for real-world deployment.
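The interaction-graph representation described above can be pictured as a small data structure: each node is one agent's intermediate reasoning step with a correctness label, and each edge is either a within-agent continuity link or a cross-agent influence link. This is a minimal sketch; the class names, fields, and helper methods are hypothetical and do not reproduce SMAGDi's released format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class EdgeKind(Enum):
    CONTINUITY = "continuity"  # the same agent's next step
    INFLUENCE = "influence"    # a step that responds to another agent's step


@dataclass
class StepNode:
    agent: int         # which debating agent produced the step
    debate_round: int  # debate round in which the step appeared
    text: str          # the intermediate reasoning step
    correct: bool      # correctness label attached to the step


@dataclass
class InteractionGraph:
    nodes: list[StepNode] = field(default_factory=list)
    edges: list[tuple[int, int, EdgeKind]] = field(default_factory=list)

    def add_step(self, node: StepNode) -> int:
        """Append a node and return its index."""
        self.nodes.append(node)
        return len(self.nodes) - 1

    def connect(self, src: int, dst: int, kind: EdgeKind) -> None:
        """Add a directed edge src -> dst."""
        self.edges.append((src, dst, kind))

    def successors(self, idx: int, kind: Optional[EdgeKind] = None) -> list[int]:
        """Indices reachable from idx in one hop, optionally filtered by edge kind."""
        return [d for s, d, k in self.edges if s == idx and (kind is None or k == kind)]

    def accuracy(self) -> float:
        """Fraction of steps labelled correct."""
        return sum(n.correct for n in self.nodes) / len(self.nodes) if self.nodes else 0.0
```

Keeping the edge kind explicit lets a training objective treat continuity and cross-agent influence separately, which is the distinction the abstract draws.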

agent, artificial intelligence, reasoning, (16 more...)

arXiv.org Artificial Intelligence

2511.05528

Country:

Asia (1.00)
North America > United States (0.28)

Genre: Research Report (0.84)

Industry: Law (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Towards Problem-dependent Optimal Learning Rates

Neural Information Processing Systems, Oct-2-2025, 06:12:20 GMT

We study problem-dependent rates, i.e., generalization errors that scale tightly with the variance or the effective loss at the "best hypothesis."

artificial intelligence, estimator, machine learning, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Zero-shot and Few-shot Learning with Instruction-following LLMs for Claim Matching in Automated Fact-checking

Pisarevskaya, Dina, Zubiaga, Arkaitz

arXiv.org Artificial Intelligence, Jan-18-2025

The claim matching (CM) task can benefit an automated fact-checking pipeline by grouping claims that can be resolved with the same fact-check. In this work, we are the first to explore zero-shot and few-shot learning approaches to the task. We consider CM as a binary classification task and experiment with a set of instruction-following large language models (GPT-3.5-turbo, Gemini-1.5-flash, Mistral-7B-Instruct, and Llama-3-8B-Instruct), investigating prompt templates. We introduce a new CM dataset, ClaimMatch, which will be released upon acceptance. We test LLMs on the CM task and find that it can be tackled by leveraging more mature yet similar tasks such as natural language inference or paraphrase detection. We also propose a pipeline for CM, which we evaluate on texts of different lengths.
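Framing CM as binary classification with an instruction-following LLM comes down to two pieces: building a zero-shot or few-shot prompt for a claim pair and mapping the model's free-text reply to a label. This is a minimal sketch; the template wording, `build_prompt`, and `parse_label` are hypothetical and are not the prompts evaluated in the paper.

```python
from __future__ import annotations

from typing import Optional, Sequence

# Hypothetical template; the paper's actual prompt templates are not reproduced here.
TEMPLATE = (
    "Can these two claims be resolved with the same fact-check? Answer Yes or No.\n"
    "Claim 1: {a}\n"
    "Claim 2: {b}\n"
    "Answer:"
)


def build_prompt(a: str, b: str, shots: Sequence[tuple[str, str, bool]] = ()) -> str:
    """Zero-shot when `shots` is empty; otherwise prepend labelled demonstrations."""
    parts = [TEMPLATE.format(a=sa, b=sb) + (" Yes" if label else " No")
             for sa, sb, label in shots]
    parts.append(TEMPLATE.format(a=a, b=b))
    return "\n\n".join(parts)


def parse_label(completion: str) -> Optional[bool]:
    """Map a reply to True (match) / False (no match); None if unparseable."""
    words = completion.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,!:")
    if first == "yes":
        return True
    if first == "no":
        return False
    return None
```

Returning `None` for unparseable replies keeps malformed generations separate from genuine "No" answers when scoring.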

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2501.1086

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.05)
Asia > Middle East > Iran (0.04)
(18 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Transportation (1.00)
Leisure & Entertainment (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning

Tang, Xiangru, Hu, Tianyu, Ye, Muyang, Shao, Yanjun, Yin, Xunjian, Ouyang, Siru, Zhou, Wangchunshu, Lu, Pan, Zhang, Zhuosheng, Zhao, Yilun, Cohan, Arman, Gerstein, Mark

arXiv.org Artificial Intelligence, Jan-11-2025

Chemical reasoning usually involves complex, multi-step processes that demand precise calculations, where even minor errors can lead to cascading failures. Furthermore, large language models (LLMs) encounter difficulties handling domain-specific formulas, executing reasoning steps accurately, and integrating code effectively when tackling chemical reasoning tasks. To address these challenges, we present ChemAgent, a novel framework designed to improve the performance of LLMs through a dynamic, self-updating library. This library is developed by decomposing chemical tasks into sub-tasks and compiling these sub-tasks into a structured collection that can be referenced for future queries. Then, when presented with a new problem, ChemAgent retrieves and refines pertinent information from the library, which we call memory, facilitating effective task decomposition and the generation of solutions. Our method designs three types of memory and a library-enhanced reasoning component, enabling LLMs to improve over time through experience. Experimental results on four chemical reasoning datasets from SciBench demonstrate that ChemAgent achieves performance gains of up to 46% (GPT-4), significantly outperforming existing methods. Our findings suggest substantial potential for future applications, including tasks such as drug discovery and materials science. Our code can be found at https://github.com/gersteinlab/chemagent
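The store-then-retrieve loop of a self-updating library can be sketched as a collection of solved sub-tasks ranked by similarity to a new query. This sketch swaps in plain bag-of-words cosine similarity for whatever retriever ChemAgent actually uses, and it collapses the paper's three memory types into one hypothetical `MemoryLibrary` class.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass, field


def _bag(text: str) -> Counter:
    """Lower-cased alphanumeric token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class MemoryLibrary:
    # (sub_task, solution) pairs accumulated from previously solved problems
    entries: list[tuple[str, str]] = field(default_factory=list)

    def add(self, sub_task: str, solution: str) -> None:
        """Self-update: store a solved sub-task so later queries can reuse it."""
        self.entries.append((sub_task, solution))

    def retrieve(self, query: str, k: int = 2) -> list[tuple[str, str]]:
        """Return the k stored sub-tasks most similar to the query."""
        q = _bag(query)
        ranked = sorted(self.entries, key=lambda e: _cosine(q, _bag(e[0])), reverse=True)
        return ranked[:k]
```

In a full system the retrieved entries would be placed in the LLM's context before it decomposes and solves the new problem; this sketch only covers storage and ranking.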

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.0659

Country: North America (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning

Kim, Seunghee, Kim, Changhyeon, Kim, Taeuk

arXiv.org Artificial Intelligence, Dec-17-2024

Real-world decision-making often requires integrating and reasoning over information from multiple modalities. While recent multimodal large language models (MLLMs) have shown promise in such tasks, their ability to perform multi-hop reasoning across diverse sources remains insufficiently evaluated. Existing benchmarks, such as MMQA, face challenges due to (1) data contamination and (2) a lack of complex queries that necessitate operations across more than two modalities, hindering accurate performance assessment. To address this, we present Financial Cross-Modal Multi-Hop Reasoning (FCMR), a benchmark created to analyze the reasoning capabilities of MLLMs by requiring them to combine information from textual reports, tables, and charts within the financial domain. FCMR is divided into three difficulty levels (Easy, Medium, and Hard), facilitating a step-by-step evaluation. In particular, problems at the Hard level require precise cross-modal three-hop reasoning and are designed so that no modality can be disregarded. Experiments on this new benchmark reveal that even state-of-the-art MLLMs struggle, with the best-performing model (Claude 3.5 Sonnet) achieving only 30.4% accuracy on the most challenging tier. We also conduct analysis to provide insights into the inner workings of the models, including the discovery of a critical bottleneck in the information retrieval phase.

modality, reasoning, statement 1, (14 more...)

arXiv.org Artificial Intelligence

2412.12567

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.04)
North America > Canada > Ontario > Toronto (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports > Hockey (1.00)
Government (1.00)
Banking & Finance (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication

Chen, Weize, Yuan, Chenfei, Yuan, Jiarui, Su, Yusheng, Qian, Chen, Yang, Cheng, Xie, Ruobing, Liu, Zhiyuan, Sun, Maosong

arXiv.org Artificial Intelligence, Jun-18-2024

Natural language (NL) has long been the predominant format for human cognition and communication, and by extension, has been similarly pivotal in the development and application of Large Language Models (LLMs). Yet, besides NL, LLMs have seen various non-NL formats during pre-training, such as code and logical expressions. NL's status as the optimal format for LLMs, particularly in single-LLM reasoning and multi-agent communication, has not been thoroughly examined. In this work, we challenge the default use of NL by exploring the utility of non-NL formats in these contexts. We show that allowing LLMs to autonomously select the most suitable format before reasoning or communicating leads to a 3.3 to 5.7% improvement in reasoning efficiency for different LLMs, and up to a 72.7% reduction in token usage in multi-agent communication, all while maintaining communicative effectiveness. Our comprehensive analysis further reveals that LLMs can devise a format from limited task instructions and that the devised format is effectively transferable across different LLMs. Intriguingly, the structured communication format decided by LLMs exhibits notable parallels with established agent communication languages, suggesting a natural evolution toward efficient, structured communication among agents. Our code is released at https://github.com/thunlp/AutoForm.

gpt-3, information, statement 1, (12 more...)

arXiv.org Artificial Intelligence

2402.18439

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > South Korea (0.04)
(14 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Sports (1.00)
Media (0.93)
Health & Medicine (0.92)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation

Mündler, Niels, He, Jingxuan, Jenko, Slobodan, Vechev, Martin

arXiv.org Artificial Intelligence, Oct-1-2023

Large language models (large LMs) are susceptible to producing text that contains hallucinated content. An important instance of this problem is self-contradiction, where the LM generates two contradictory sentences within the same context. In this work, we present a comprehensive investigation into self-contradiction for various instruction-tuned LMs, covering evaluation, detection, and mitigation. Our analysis reveals the prevalence of self-contradictions when LMs generate text for open-domain topics, e.g., in 17.7% of all sentences produced by ChatGPT. Detecting self-contradiction also complements retrieval-based methods, as a large portion of self-contradictions (e.g., 35.8% for ChatGPT) cannot be verified using Wikipedia. We then propose a novel prompting-based framework designed to effectively detect and mitigate self-contradictions. Our detector achieves high accuracy, e.g., around 80% F1 score when prompting ChatGPT. The mitigation algorithm iteratively refines the generated text to remove contradictory information while preserving text fluency and informativeness. Importantly, our entire framework is applicable to black-box LMs and does not require external grounded knowledge. Our approach is practically effective and has been released as a push-button tool to benefit the public, available at https://chatprotect.ai/.
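The detector's headline figure is an F1 score over binary contradiction judgments. A minimal sketch of that metric, assuming each evaluated sentence pair carries a gold label and a predicted label (1 = contradictory, 0 = consistent); the function name is illustrative, and any labels passed to it here are toy values, not data from the paper.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1 = contradictory sentence pair)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, one correct flag, one false alarm, and one miss give precision = recall = 0.5 and hence F1 = 0.5.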

alm, chatgpt, freeman, (15 more...)

arXiv.org Artificial Intelligence

2305.15852

Country:

North America > Cuba (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Nebraska (0.04)
(29 more...)

Genre:

Personal (1.00)
Research Report (0.65)

Industry:

Media > Film (1.00)
Leisure & Entertainment > Sports (1.00)
Media > Music (0.93)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Information Bottleneck Analysis of Deep Neural Networks via Lossy Compression

Butakov, Ivan, Tolmachev, Aleksander, Malanchuk, Sofia, Neopryatnaya, Anna, Frolov, Alexey, Andreev, Kirill

arXiv.org Artificial Intelligence, May-13-2023

The Information Bottleneck (IB) principle offers an information-theoretic framework for analyzing the training process of deep neural networks (DNNs). Its essence lies in tracking the dynamics of two mutual information (MI) values: one between the hidden layer and the class label, and the other between the hidden layer and the DNN input. According to the hypothesis put forth by Shwartz-Ziv and Tishby (2017), the training process consists of two distinct phases: fitting and compression. The latter phase is believed to account for the good generalization performance exhibited by DNNs. Due to the challenging nature of estimating MI between high-dimensional random vectors, this hypothesis has only been verified for toy NNs or specific types of NNs, such as quantized NNs and dropout NNs. In this paper, we introduce a comprehensive framework for conducting IB analysis of general NNs. Our approach leverages the stochastic NN method proposed by Goldfeld et al. (2019) and incorporates a compression step to overcome the obstacles associated with high dimensionality. In other words, we estimate the MI between the compressed representations of high-dimensional random vectors. The proposed method is supported by both theoretical and practical justifications. Notably, we demonstrate the accuracy of our estimator through synthetic experiments featuring predefined MI values. Finally, we perform IB analysis on a close-to-real-scale convolutional DNN, which reveals new features of the MI dynamics.
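The quantities tracked in IB analysis are mutual information values. For discrete (for example, quantized or binned) variables, the textbook plug-in estimate is I(X;Y) = Σ p(x,y) log(p(x,y) / (p(x) p(y))) with empirical frequencies in place of probabilities. The sketch below computes it from paired samples for intuition only; it is not the stochastic-NN-plus-compression estimator the paper proposes, and it becomes unreliable precisely in the high-dimensional setting the abstract describes.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence


def plugin_mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in MI estimate (in nats) from paired discrete samples."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # empirical joint counts
    px = Counter(xs)              # empirical marginal counts of X
    py = Counter(ys)              # empirical marginal counts of Y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi
```

Identical uniform binary variables give log 2 nats, and independent ones give 0; with many distinct values and few samples the plug-in estimate is biased upward, which is one reason high-dimensional MI estimation is hard.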

artificial intelligence, estimation, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2305.08013

Country:

North America > United States (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)

Few-Shot Out-of-Domain Transfer Learning of Natural Language Explanations in a Label-Abundant Setup

Yordanov, Yordan, Kocijan, Vid, Lukasiewicz, Thomas, Camburu, Oana-Maria

arXiv.org Artificial Intelligence, Oct-22-2022

Training a model to provide natural language explanations (NLEs) for its predictions usually requires the acquisition of task-specific NLEs, which is time- and resource-consuming. A potential solution is the few-shot out-of-domain transfer of NLEs from a parent task with many NLEs to a child task. In this work, we examine the setup in which the child task has few NLEs but abundant labels. We establish four few-shot transfer learning methods that cover the possible fine-tuning combinations of the labels and NLEs for the parent and child tasks. We transfer explainability from a large natural language inference dataset (e-SNLI) separately to two child tasks: (1) hard cases of pronoun resolution, where we introduce the small-e-WinoGrande dataset of NLEs on top of the WinoGrande dataset, and (2) commonsense validation (ComVE). Our results demonstrate that the parent task helps with NLE generation and we establish the best methods for this setup.

machine learning, natural language, prediction, (16 more...)

arXiv.org Artificial Intelligence

2112.06204

Country:

Pacific Ocean (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.62)